APPLICATION NOTE (AN1099)
ST10 DSP MAC SIGNAL PROCESSING ALGORITHMS
The ST10 multiply-accumulate co-processor (MAC) performs common signal processing functions. It executes single-cycle instructions, including 32-bit signed arithmetic (addition, subtraction, shift, ...), 16 by 16-bit multiplication, and multiplication with cumulative addition/subtraction. The MAC includes the following components: a 16 by 16-bit signed/unsigned parallel multiplier, a scaler (one-bit left shifter), a 40-bit signed arithmetic unit, a 40-bit accumulator register, a data limiter, an 8-bit left/right shifter and a repeat unit. A full description of the MAC co-processor, including its registers and an instruction summary, is given in the ST10R262 datasheet and the ST10R262/272 User's Manuals. This application note shows how to implement common signal processing algorithms with the MAC, using digital filters and matrix operations as examples. The sample code in this application note can be copied from the PDF document and pasted into your application.
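The accumulator behavior described above can be modeled on a host to reason about scaling and saturation. The C sketch below is a simplified model under stated assumptions, not the device specification: the 40-bit accumulator is held in an `int64_t`, the hypothetical `mp` argument stands for the MP bit and applies the one-bit left shift of the scaler, and the limiter is assumed to saturate the high word to 7FFFh or 8000h whenever the accumulator does not fit in 32 bits. All function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical model: a 40-bit two's-complement accumulator held in an int64_t. */
static int64_t wrap40(int64_t v) {
    const int64_t m40 = (int64_t)1 << 40;
    int64_t r = v % m40;                 /* r lies in (-2^40, 2^40) */
    if (r >= m40 / 2) r -= m40;          /* fold into [-2^39, 2^39) */
    else if (r < -m40 / 2) r += m40;
    return r;
}

/* One multiply-accumulate step: acc + x*y, with the product shifted
   left by one bit when mp is nonzero (assumed scaler behavior). */
static int64_t mac_step(int64_t acc, int16_t x, int16_t y, int mp) {
    int64_t p = (int64_t)x * y;
    if (mp) p *= 2;
    return wrap40(acc + p);
}

/* Limited high word (assumed limiter behavior): saturate when acc is
   outside the signed 32-bit range, otherwise return bits 31..16. */
static int16_t limited_high(int64_t acc) {
    if (acc > INT32_MAX) return INT16_MAX;
    if (acc < INT32_MIN) return INT16_MIN;
    return (int16_t)((acc - (uint16_t)acc) / 65536);   /* floor(acc / 2^16) */
}
```

With `mp` set, 0x4000 x 0x4000 (0.5 x 0.5 in 1.15 format) accumulates to 0x20000000, which is 0.25 in 1.31 format, and `limited_high` then returns 0x2000.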
[Figure: ST10-DSP CPU block diagram showing the internal RAM (dual-port data buses), external memory, program memory, peripheral interface control, the new addressing features (IDX0, IDX1, QX0, QX1, QR0, QR1) and the MAC co-processor (16 x 16 multiplier, 40-bit ALU, shifter, 40-bit accumulator, repeat unit; registers MCW, MSW, MRW, MAH, MAL)]
72 - TCH - 170 - 00
ST10 - DSP MAC SIGNAL PROCESSING ALGORITHMS
Contents
1 Co-Processor Initialization
2 Mathematics
  2.1 Double precision multiplication
  2.2 Nth order power series
  2.3 [NxN][Nx1] matrix multiply
  2.4 N-real multiply (windowing)
3 FIR Filter - Real Correlation - Convolution
  3.1 Single-precision FIR filter
  3.2 Extended-precision FIR filter
4 IIR Filters
  4.1 Nth Order IIR filter: direct form 1
  4.2 Nth Order IIR filter: direct form 2
  4.3 N-cascaded real biquads (direct form 2)
  4.4 N-cascaded real biquads: transpose form
5 LMS Adaptive Filter
  5.1 Single-precision LMS adaptive filter
  5.2 Extended-precision LMS adaptive filter
6 Operations on Tables
  6.1 Table move
  6.2 Find the index of a maximum value in a table
  6.3 Compare for search
7 Summary of Routines
1 Co-Processor Initialization

This routine initializes the co-processor registers:

    ;
    ; Control registers initialization.
    ;
    MOV     MCW, #mcw               ; (MCW) <- mcw.
    MOV     MRW, #mrw               ; (MRW) <- mrw.
    ;
    ; Accumulator initialization.
    ;
    MOV     MAH, #data16            ; (MAH) <- #data16,
                                    ; (MAE) <- 8 times (MAH15),
                                    ; (MAL) <- 0000H.
    ;
    ; Core SFRs initialization.
    ;
    EXTR    #6                      ; The next 6 instructions use the ESFR space.
    MOV     IDX0, #idx0             ; (IDX0) <- idx0.
    MOV     IDX1, #idx1             ; (IDX1) <- idx1.
    MOV     QX0, #qx0               ; (QX0) <- qx0.
    MOV     QX1, #qx1               ; (QX1) <- qx1.
    MOV     QR0, #qr0               ; (QR0) <- qr0.
    MOV     QR1, #qr1               ; (QR1) <- qr1.

Instruction cycles (total): 10
Program words: 19
2 Mathematics

2.1 Double precision multiplication

This routine computes the 64-bit product P = X*Y of two 32-bit signed values X and Y. It assumes that:
* XL (LSW) and XH (MSW) are stored in R0 and R1, respectively.
* YL (LSW) and YH (MSW) are stored in R2 and R3, respectively.
* MP and MS are cleared.

After computation, the 64-bit product P is stored in R4-R7, where R7 contains the most significant word and R4 the least significant word.
    ;
    ; XL*YL multiplication (unsigned/unsigned).
    ;
    CoMULu  R0, R2                  ; (ACC) <- XL*YL.
    CoSTORE R4, MAL                 ; (R4) <- (ACC)L.
    ;
    ; XL*YH multiplication (unsigned/signed) and XH*YL multiplication (signed/unsigned).
    ;
    CoSHR   #8                      ; (ACC) <- (ACC) >> 8.
    CoSHR   #8                      ; (ACC) <- (ACC) >> 8.
    CoMACus R0, R3                  ; (ACC) <- (ACC) + XL*YH.
    CoMACsu R1, R2                  ; (ACC) <- (ACC) + XH*YL.
    CoSTORE R5, MAL                 ; (R5) <- (ACC)L.
    ;
    ; XH*YH multiplication (signed/signed).
    ;
    CoASHR  #8                      ; (ACC) <- (ACC) >>a 8.
    CoASHR  #8                      ; (ACC) <- (ACC) >>a 8.
    CoMAC   R1, R3                  ; (ACC) <- (ACC) + XH*YH.
    CoSTORE R6, MAL                 ; (R6) <- (ACC)L.
    CoSTORE R7, MAH                 ; (R7) <- (ACC)H.
Instruction cycles (total): 12
Program words: 24
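The partial-product scheme above can be checked on a host with a C model. This is a sketch, not the routine itself: the function name `mul32x32_parts` is hypothetical, and the accumulator is modeled as an `int64_t`, which is wide enough here. Each step mirrors one group of instructions: the unsigned XL*YL product, a 16-bit shift (the two `CoSHR #8`), the two mixed-sign cross products, an arithmetic 16-bit shift (the two `CoASHR #8`) and the signed XH*YH product.

```c
#include <stdint.h>

/* Floor division by 2^16 using only exact integer arithmetic. */
static int64_t floor_div_65536(int64_t v) {
    return (v - (uint16_t)v) / 65536;
}

/* 32x32 -> 64-bit signed product built from four 16x16 partial products. */
int64_t mul32x32_parts(int32_t x, int32_t y) {
    uint16_t xl = (uint16_t)x, yl = (uint16_t)y;          /* unsigned LSWs */
    int16_t  xh = (int16_t)floor_div_65536(x);             /* signed MSWs */
    int16_t  yh = (int16_t)floor_div_65536(y);

    int64_t acc = (int64_t)xl * yl;                         /* CoMULu */
    uint16_t p0 = (uint16_t)acc;                            /* -> R4 */
    acc = floor_div_65536(acc);                             /* >> 16 */
    acc += (int64_t)xl * yh + (int64_t)xh * yl;             /* CoMACus, CoMACsu */
    uint16_t p1 = (uint16_t)acc;                            /* -> R5 */
    acc = floor_div_65536(acc);                             /* >>a 16 */
    acc += (int64_t)xh * yh;                                /* CoMAC */
    /* acc now holds R7:R6; rebuild P = R7:R6:R5:R4 */
    return acc * 4294967296LL + ((int64_t)p1 * 65536 + p0);
}
```

Because the model returns the same value as a native 64-bit multiply, it can serve as a reference when checking the contents of R4-R7 on the target.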
2.2 Nth order power series

The formula is:

$$y = \sum_{i=0}^{n} a(i)\,x^{i} = \Big(\big((a(n)\,x + a(n-1))\,x + a(n-2)\big)\,x + \cdots\Big)\,x + a(0)$$

The associated pseudo code is:

    ; y = a(n);
    ; for (i=1 to n) { y = y*x + a(n-i); }
    ; x = input.
    ; y = output.
    ; a(i), for i=0,1,...,n, are the coefficients.

Assuming that:
* x is a fractional value located in R1;
* the a(i) are fractional;
* y can be represented as a 16-bit value.
The final result is contained in the accumulator (ACC). To minimize the loop overhead, the program uses loop unrolling and assumes that n is even.

Figure 1 Memory map

    Low address    a(0)
                   a(1)
                   ...
                   a(n-3)
                   a(n-2)
                   a(n-1)
    High address   a(n)     <- R9
    ;
    ; Initialization.
    ;
    MOV     MCW, #mcw               ; (MCW) <- mcw, MS and MP are cleared.
    MOV     MRW, #mrw               ; (MRW) <- mrw.
    MOV     R0, #0                  ; (R0) <- 0.
    MOV     R9, a(n)_address        ; (R9) <- address of a(n).
    ;
    ; Initialize the loop count.
    ;
    MOV     R3, #n/2                ; (R3) <- n/2.
    ;
    ; Loop prolog.
    ;
    CoMUL   R1, [R9-]               ; (ACC) <- a(n)*x; (R9) <- (R9)-2.
    ;
    ; Unrolled loop.
    ;
    SERIE_LOOP:
    CoADD   R0, [R9-]               ; (ACC) <- (ACC) + a(i-1); (R9) <- (R9)-2.
    CoSTORE R2, MAS                 ; (R2) <- limited (ACC).
    CoMUL   R1, R2                  ; (ACC) <- (R2)*x.
    CoADD   R0, [R9-]               ; (ACC) <- (ACC) + a(i-2); (R9) <- (R9)-2.
    CoSTORE R2, MAS                 ; (R2) <- limited (ACC).
    CoMUL   R1, R2                  ; (ACC) <- (R2)*x.
    ;
    ; End-of-loop checking.
    ;
    CMPD1   R3, #0h                 ; (R3) <- (R3)-1.
    JMPR    cc_Z, SERIE_LOOP        ; End-of-loop test & branch.
    ;
    ; Loop epilog.
    ;
    CoADD   R0, [R9]                ; (ACC) <- (ACC) + a(0).

Instruction cycles (total): 4N+8
Program words: 28
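The evaluation order of this routine can be reproduced in C for host-side checking. The sketch below rests on assumptions that the routine does not state: the coefficients and x are Q15 (1.15 fractional) values, each product is shifted left by one bit so that it lines up with a coefficient added as a Q31 value, and the limited store (`CoSTORE R2, MAS`) is modeled as saturation of the 16-bit high word. The actual scaling depends on the `mcw` value chosen. The names `poly_q15` and `q31_high_sat` are hypothetical, and the C loop performs one Horner step per iteration, whereas the assembly routine unrolls two.

```c
#include <stdint.h>

/* Saturating high word of a Q31 accumulator (models a limited store). */
static int16_t q31_high_sat(int64_t acc) {
    if (acc > INT32_MAX) return INT16_MAX;
    if (acc < INT32_MIN) return INT16_MIN;
    return (int16_t)((acc - (uint16_t)acc) / 65536);   /* floor(acc / 2^16) */
}

/* Horner evaluation of y = a[0] + a[1]*x + ... + a[n]*x^n, all values Q15. */
int16_t poly_q15(const int16_t *a, int n, int16_t x) {
    int16_t y = a[n];                                   /* y = a(n) */
    for (int i = n - 1; i >= 0; --i) {
        int64_t acc = 2 * (int64_t)y * x;               /* y*x as Q31 */
        acc += (int64_t)a[i] * 65536;                   /* + a(i) as Q31 */
        y = q31_high_sat(acc);                          /* limited Q15 result */
    }
    return y;
}
```

For example, with a(0) = 0.5 (0x4000), a(1) = 0.25 (0x2000) and x = 0.5 (0x4000), `poly_q15` returns 0x5000, that is 0.625.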
2.3 [NxN][Nx1] matrix multiply

$$\begin{bmatrix} C_1 \\ C_2 \\ \vdots \\ C_N \end{bmatrix} = \begin{bmatrix} A_{11} & A_{12} & A_{13} & \cdots & A_{1N} \\ A_{21} & A_{22} & A_{23} & \cdots & A_{2N} \\ \vdots & \vdots & \vdots & & \vdots \\ A_{N1} & A_{N2} & A_{N3} & \cdots & A_{NN} \end{bmatrix} \times \begin{bmatrix} B_1 \\ B_2 \\ \vdots \\ B_N \end{bmatrix}$$

Figure 2 [NxN][Nx1] matrix multiply

The [NxN][Nx1] matrix multiply memory map is shown below:

Figure 3 Memory map

    DPRAM, low to high address:
    A11 A12 A13 ... A1N ... ANN           IDX0 points to A11
    B1 B2 ... BN                          R9 points to B1
    Result area before: xx xx xx ... xx   R10 points to the first word
    Result area after:  C1 C2 C3 ... CN
N is assumed to be less than 31.
    ;
    ; MAC dedicated registers' initialization.
    ;
    EXTR    #2                      ; The next 2 instructions use the ESFR space.
    MOV     IDX0, @A11              ; (IDX0) <- A11 address.
    MOV     QR0, #N-1               ; (QR0) <- N-1.
    ;
    ; GPRs initialization: R7 is used as loop counter, R9 contains the
    ; B1 address and R10 contains the C1 address.
    ;
    MOV     R7, #N                  ; (R7) <- N.
    MOV     R9, @B1                 ; (R9) <- B1 address.
    MOV     R10, @C1                ; (R10) <- C1 address.
    MATRIX_LOOP:
    ;
    ; Dot product prolog.
    ;
    CoMUL   [IDX0+], [R9+]          ; (ACC) <- Ai1*B1; (IDX0) <- (IDX0)+2; (R9) <- (R9)+2.
    ;
    ; Dot product loop.
    ;
    REPEAT N-2 TIMES
    CoMAC   [IDX0+], [R9+]          ; (ACC) <- (ACC) + Aij*Bj; (IDX0) <- (IDX0)+2; (R9) <- (R9)+2.
    ;
    ; Dot product epilog (provide Ci in an appropriate format).
    ;
    CoMAC   [IDX0+], [R9-QR0]       ; (ACC) <- (ACC) + AiN*BN; (IDX0) <- (IDX0)+2;
                                    ; (R9) <- (R9)-2*(N-1).
    ;
    ; Shift & rounding.
    ;
    CoASHR  #data3, rnd             ; (ACC) <- ((ACC) >>a #data3) + rnd.
    ;
    ; Write Ci into memory.
    ;
    CoSTORE [R10+], MAS             ; ((R10)) <- Ci; (R10) <- (R10)+2.
    ;
    ; End-of-loop checking.
    ;
    CMPD1   R7, #0h                 ; (R7) <- (R7)-1.
    JMPR    cc_Z, MATRIX_LOOP       ; End-of-loop test & branch.

Instruction cycles (total): N^2+4N+7
Program words: 24
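A host-side C model of the same computation may help when validating results. It is a sketch under assumptions: A is stored row by row as in Figure 3, `shift` stands for `#data3`, rounding is modeled as adding 8000H after the arithmetic shift, and the limited store is modeled as saturation of the high word. The names `matvec_mac`, `asr_floor` and `high_sat` are hypothetical.

```c
#include <stdint.h>

/* Arithmetic right shift by s bits, written as an exact floor division. */
static int64_t asr_floor(int64_t v, int s) {
    int64_t d = (int64_t)1 << s;
    int64_t q = v / d;
    return (v % d < 0) ? q - 1 : q;
}

/* Saturating high word (bits 31..16) of the accumulator. */
static int16_t high_sat(int64_t acc) {
    if (acc > INT32_MAX) return INT16_MAX;
    if (acc < INT32_MIN) return INT16_MIN;
    return (int16_t)asr_floor(acc, 16);
}

/* c = A*b for an n x n matrix A stored row by row (A11, A12, ..., ANN). */
void matvec_mac(int n, const int16_t *a, const int16_t *b, int16_t *c, int shift) {
    for (int i = 0; i < n; ++i) {                     /* one MATRIX_LOOP pass per row */
        int64_t acc = 0;
        for (int j = 0; j < n; ++j)                   /* CoMUL, then CoMAC */
            acc += (int64_t)a[i * n + j] * b[j];
        acc = asr_floor(acc, shift) + 0x8000;         /* CoASHR #data3, rnd */
        c[i] = high_sat(acc);                         /* CoSTORE [R10+], MAS */
    }
}
```

The inner C loop covers the prolog, the `REPEAT` loop and the epilog together; the pointer rewind done by `[R9-QR0]` is implicit in the `b[j]` indexing.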
2.4 N-real multiply (windowing)

The formula is:

$$y(i) = x(i)\,w(i), \qquad i = 0, 1, \ldots, N-1$$

The memory map is shown in Figure 4. To minimize the loop overhead, this program uses loop unrolling. The associated pseudo code is:

    ; for (i=0 to N-1) { y(i) = x(i)*w(i); }
    ; x(n) = input signal at time n.
    ; w(n) = window coefficient at time n.
Figure 4 Memory map

    DPRAM, low to high address:
    Before: x(n-N+1) x(n-N+2) ... x(n-3) x(n-2) x(n-1) x(n)    IDX0 points to x(n-N+1)
    After:  y(n-N+1) y(n-N+2) ... y(n-3) y(n-2) y(n-1) y(n)    (results overwrite the inputs)
    w(N-1) w(N-2) ... w(3) w(2) w(1) w(0)                      R9 points to w(N-1)

This routine assumes that the following general purpose and co-processor registers (SFRs) have been initialized once and for all, and that N is a multiple of 4:
* R9 contains the w(N-1) address.
* IDX0 contains the x(n-N+1) address.
* QX0 and QR0 contain N-1.
    ;
    ; Initialize the loop count.
    ;
    MOV     R3, #N/4                ; (R3) <- N/4.
    WINDOW_LOOP:
    ;
    ; Unrolled loop.
    ;
    CoMUL   [IDX0], [R9+], rnd      ; (ACC) <- w(i)*x(i) + rnd; (R9) <- (R9)+2.
    CoSTORE [IDX0+], MAH            ; ((IDX0)) <- y(i); (IDX0) <- (IDX0)+2.
    CoMUL   [IDX0], [R9+], rnd      ; (ACC) <- w(i+1)*x(i+1) + rnd; (R9) <- (R9)+2.
    CoSTORE [IDX0+], MAH            ; ((IDX0)) <- y(i+1); (IDX0) <- (IDX0)+2.
    CoMUL   [IDX0], [R9+], rnd      ; (ACC) <- w(i+2)*x(i+2) + rnd; (R9) <- (R9)+2.
    CoSTORE [IDX0+], MAH            ; ((IDX0)) <- y(i+2); (IDX0) <- (IDX0)+2.
    CoMUL   [IDX0], [R9+], rnd      ; (ACC) <- w(i+3)*x(i+3) + rnd; (R9) <- (R9)+2.
    CoSTORE [IDX0+], MAH            ; ((IDX0)) <- y(i+3); (IDX0) <- (IDX0)+2.
    ;
    ; End-of-loop checking.
    ;
    CMPD1   R3, #0h                 ; (R3) <- (R3)-1.
    JMPR    cc_Z, WINDOW_LOOP       ; End-of-loop test & branch.

Instruction cycles (total): 2N + 2N/4 + 2
Program words: 5 + (2*2)*4
Note

The number of instruction cycles and program words required by this routine depends on the unrolling factor. In "2N", the factor 2 is the number of cycles per coefficient, and "2N/4" corresponds to the branch penalty when the unrolling factor is 4. Similarly, "(2*2)*4-4" corresponds to the increase in program words, compared with a loop that is not unrolled, when the unrolling factor is 4. In general, if URF denotes the unrolling factor, the execution time is 2N + 2N/URF + 2 instruction cycles and the code size is 5 + 4*URF program words.
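The unrolled loop can be mirrored in C to see how the unrolling factor shapes the code. This sketch assumes Q15 data, a product shifted left by one bit (fractional mode), 8000H added for rounding before taking the high word, and saturation of the single overflowing case 8000H x 8000H; none of these details is specified by the routine above. `window_q15` and `mul_q15_round` are hypothetical names, and both arrays are passed in memory order, with the results written over the input as in Figure 4.

```c
#include <assert.h>
#include <stdint.h>

/* Rounded Q15 product: high word of (2*x*w + 8000H), saturated. */
static int16_t mul_q15_round(int16_t x, int16_t w) {
    int64_t p = 2 * (int64_t)x * w + 0x8000;
    if (p > INT32_MAX) p = INT32_MAX;              /* only 8000H * 8000H overflows */
    return (int16_t)((p - (uint16_t)p) / 65536);  /* floor(p / 2^16) */
}

/* In-place windowing, unrolled by 4 like WINDOW_LOOP; n must be a multiple of 4. */
void window_q15(int16_t *x, const int16_t *w, int n) {
    assert(n % 4 == 0);
    for (int i = 0; i < n; i += 4) {
        x[i]     = mul_q15_round(x[i],     w[i]);
        x[i + 1] = mul_q15_round(x[i + 1], w[i + 1]);
        x[i + 2] = mul_q15_round(x[i + 2], w[i + 2]);
        x[i + 3] = mul_q15_round(x[i + 3], w[i + 3]);
    }
}
```

Changing the unrolling factor only changes how many products sit in the loop body; the loop test then runs n/URF times, which is the trade-off described in the note.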
3 FIR Filter - Real Correlation - Convolution

3.1 Single-precision FIR filter

The pseudo code is:

    ; y(n)=0;
    ; for (k=0 to L-1) { y(n) = y(n) + h(k)*x(n-k); }
    ; x(n) = input signal at time n.
    ; y(n) = output signal at time n.
    ; h(k) = k'th coefficient.
    ; L = number of coefficient taps in the filter.

This program illustrates the use of the multiply and multiply-accumulate instructions, of CoMIN and CoMAX (which perform a programmable saturation), and of a shift instruction. The corresponding memory map is shown below. It is assumed that the coefficients and samples have been initialized by another routine.
Figure 5 Memory map

    DPRAM, low to high address:
    Before: x(n-L+1) x(n-L+2) ... x(n-3) x(n-2) x(n-1) x(n-1)    IDX0 points to x(n-L+1)
    After:  x(n-L+2) x(n-L+3) ... x(n-2) x(n-1) x(n)   x(n)
    h(L-1) h(L-2) ... h(3) h(2) h(1) h(0)                         R9 points to h(L-1)

This routine assumes that the following general purpose registers and co-processor registers (SFRs) have been initialized:
* R0 with 0000H.
* R1 with the 16-bit maximum tolerated value.
* R2 with the 16-bit minimum tolerated value.
* R9 with the h(L-1) address.
* IDX0 with the x(n-L+1) address.
* QX0 and QR0 with L-1.
    ;
    ; Repeat count initialization (repeat count > 31).
    ;
    MOV     MRW, #L-4               ; (MRW) <- L-4.
    ;
    ; Read the new filter input from an (E)SFR and move it into the DPRAM
    ; at the x(n-1) address, thus overwriting x(n-1).
    ;
    MOV     @x(n), ADC_sfr          ; Move the new input x(n).
    ;
    ; FIR prolog: first multiplication.
    ;
    CoMUL   [IDX0+], [R9+]          ; (ACC) <- h(L-1)*x(n-L+1); (IDX0) <- (IDX0)+2; (R9) <- (R9)+2.
    ;
    ; FIR loop: repeat the same MAC instruction L-2 times.
    ;
    REPEAT MRW TIMES
    CoMACM  [IDX0+], [R9+]          ; (ACC) <- (ACC) + h(i)*x(n-i) & x(n-i-1) <- x(n-i);
                                    ; (IDX0) <- (IDX0)+2; (R9) <- (R9)+2.
    ;
    ; FIR epilog: last MAC instruction; provide y(n) in an appropriate format.
    ;
    CoMACM  [IDX0-QX0], [R9-QR0]    ; (ACC) <- (ACC) + h(0)*x(n) & x(n-1) <- x(n);
                                    ; (IDX0) <- (IDX0)-2*(L-1); (R9) <- (R9)-2*(L-1).
    ;
    ; Shift & rounding.
    ;
    CoASHR  #data3, rnd             ; (ACC) <- ((ACC) >>a #data3) + rnd.
    ;
    ; Limiting.
    ;
    CoMIN   R0, R1                  ; (ACC) <- min((ACC), MAX).
    CoMAX   R0, R2                  ; (ACC) <- max((ACC), MIN).
    ;
    ; Write the new filter output y(n) into an (E)SFR.
    ;
    NOP                             ; Pipeline effect.
    MOV     DAC_sfr, MAH            ; Move the new output y(n).

                           Instruction cycles   Program words
    Read input sample              1                  2
    Initialization                 1                  2
    FIR loop                       L                  6
    Post-processing                4                  7
    Write output sample            1                  2
    Total                         L+7                19
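The data-move behavior of CoMACM is the easiest part to get wrong, so a C model of one output sample may help. This sketch follows the memory map in Figure 5: `xbuf` holds the L sample slots in memory order (oldest first, the last slot receiving x(n)) and `hbuf` holds h(L-1) ... h(0). It assumes that `rnd` adds 8000H and that CoMIN/CoMAX compare the accumulator with the limits placed in the high word (R1:R0 and R2:R0). The names `fir_step` and `asr_floor` are hypothetical.

```c
#include <stdint.h>

/* Arithmetic right shift by s bits, written as an exact floor division. */
static int64_t asr_floor(int64_t v, int s) {
    int64_t d = (int64_t)1 << s;
    int64_t q = v / d;
    return (v % d < 0) ? q - 1 : q;
}

/* One output sample of the single-precision FIR routine.
   xbuf[0..L-1]: x(n-L+1) ... x(n-1), spare slot (memory order, oldest first)
   hbuf[0..L-1]: h(L-1) ... h(0)                  (memory order)
   ymin, ymax  : the 16-bit limits held in R2 and R1. */
int16_t fir_step(int16_t *xbuf, const int16_t *hbuf, int L, int16_t xn,
                 int shift, int16_t ymin, int16_t ymax) {
    xbuf[L - 1] = xn;                              /* MOV @x(n), ADC_sfr */
    int64_t acc = (int64_t)hbuf[0] * xbuf[0];      /* CoMUL prolog */
    for (int k = 1; k < L; ++k) {                  /* CoMACM loop and epilog */
        acc += (int64_t)hbuf[k] * xbuf[k];
        xbuf[k - 1] = xbuf[k];                     /* data move: copy into the previous slot */
    }
    acc = asr_floor(acc, shift) + 0x8000;          /* CoASHR #data3, rnd */
    int64_t hi = (int64_t)ymax * 65536, lo = (int64_t)ymin * 65536;
    if (acc > hi) acc = hi;                        /* CoMIN R0, R1 */
    if (acc < lo) acc = lo;                        /* CoMAX R0, R2 */
    return (int16_t)asr_floor(acc, 16);            /* MAH */
}
```

After each call the buffer has already been shifted by one slot, as in the "After" row of Figure 5, so the next call only has to write the new sample into the last slot.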
3.2 Extended-precision FIR filter

This routine implements an FIR filter with 32-bit coefficients and 16-bit input samples.
* The extended-precision FIR filter uses the same naming convention as the single-precision FIR filter.
* hL(k) and hH(k) stand for the least significant (LS) and most significant (MS) words of the k'th coefficient, respectively.
* For simplicity, MP in MCW should be cleared.
Figure 6 shows the memory map. It is assumed that both coefficients and samples have been initialized by another routine. This program illustrates the use of signed/unsigned multiplications.
Figure 6 Memory map

    DPRAM, low to high address:
    Before: x(n-L+1) x(n-L+2) ... x(n-3) x(n-2) x(n-1) x(n-1)    IDX0 points to x(n-L+1)
    After:  x(n-L+2) x(n-L+3) ... x(n-2) x(n-1) x(n)   x(n)
    hL(L-1) hH(L-1) hL(L-2) hH(L-2) ... hL(1) hH(1) hL(0) hH(0)
                            R9 points to hL(L-1), R10 points to hH(L-1)

This routine assumes that the following general purpose registers and co-processor registers (SFRs) have been initialized:
* R0 with 0000H.
* R1 with the 16-bit maximum tolerated value.
* R2 with the 16-bit minimum tolerated value.
* R9 with the hL(L-1) address.
* R10 with the hH(L-1) address.
* IDX0 with the x(n-L+1) address.
* QX0 with L-1.
* QR0 with 2.
* QR1 with 2*L-1.
    ;
    ; Repeat count initialization (repeat count > 31).
    ;
    MOV     MRW, #L-4               ; (MRW) <- L-4.
    ;
    ; Read the new filter input from an (E)SFR and move it into the DPRAM
    ; at the x(n-1) address, therefore overwriting x(n-1).
    ;
    MOV     @x(n), ADC_sfr          ; Move the new input x(n).
    ;
    ; FIR prolog (LSWs of impulse response): first multiplication.
    ;
    CoMULsu [IDX0+], [R9+QR0]       ; (ACC) <- hL(L-1)*x(n-L+1); (IDX0) <- (IDX0)+2; (R9) <- (R9)+4.
    ;
    ; FIR loop (LSWs of impulse response): repeat the same MAC instruction L-2 times.
    ;
    REPEAT MRW TIMES
    CoMACsu [IDX0+], [R9+QR0]       ; (ACC) <- (ACC) + hL(i)*x(n-i); (IDX0) <- (IDX0)+2; (R9) <- (R9)+4.
    ;
    ; FIR epilog (LSWs of impulse response): last MAC instruction.
    ;
    CoMACsu [IDX0-QX0], [R9-QR1]    ; (ACC) <- (ACC) + hL(0)*x(n);
                                    ; (IDX0) <- (IDX0)-2*(L-1); (R9) <- (R9)-2*(2L-1).
    ;
    ; Rounding & shift.
    ;
    MOV     MRW, #L-4               ; (MRW) <- L-4.
    CoRND                           ; (ACC) <- (ACC) + rnd.
    CoASHR  #8                      ; (ACC) <- (ACC) >>a 8.
    CoASHR  #8                      ; (ACC) <- (ACC) >>a 8.
    ;
    ; FIR prolog (MSWs of impulse response): first multiplication.
    ;
    CoMAC   [IDX0+], [R10+QR0]      ; (ACC) <- (ACC) + hH(L-1)*x(n-L+1); (IDX0) <- (IDX0)+2;
                                    ; (R10) <- (R10)+4.
    ;
    ; FIR loop (MSWs of impulse response): repeat the same MAC instruction L-2 times.
    ;
    REPEAT MRW TIMES
    CoMACM  [IDX0+], [R10+QR0]      ; (ACC) <- (ACC) + hH(i)*x(n-i) & x(n-i-1) <- x(n-i);
                                    ; (IDX0) <- (IDX0)+2; (R10) <- (R10)+4.
    ;
    ; FIR epilog (MSWs of impulse response): last MAC instruction; provide y(n)
    ; in an appropriate format.
    ;
    CoMACM  [IDX0-QX0], [R10-QR1]   ; (ACC) <- (ACC) + hH(0)*x(n) & x(n-1) <- x(n);
                                    ; (IDX0) <- (IDX0)-2*(L-1); (R10) <- (R10)-2*(2L-1).
    ;
    ; Shift & rounding.
    ;
    CoASHR  #data3, rnd             ; (ACC) <- ((ACC) >>a #data3) + rnd.
    ;
    ; Limiting.
    ;
    CoMIN   R0, R1                  ; (ACC) <- min((ACC), MAX).
    CoMAX   R0, R2                  ; (ACC) <- max((ACC), MIN).
    ;
    ; Write the new filter output y(n) into an (E)SFR.
    ;
    NOP                             ; Pipeline effect.
    MOV     DAC_sfr, MAH            ; Move the new output y(n).

                           Instruction cycles   Program words
    Read input sample              1                  2
    Initialization                 2                  4
    FIR loop                     2L+3                18
    Post-processing                4                  7
    Write output sample            1                  2
    Total                        2L+11               33
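Why two passes with a 16-bit shift in between give a correctly rounded result can be checked with a short C model. This sketch (hypothetical name `dot_ext`) splits each 32-bit coefficient into an unsigned low word hL and a signed high word hH, accumulates the hL products, adds 8000H and shifts right by 16 (the CoRND and the two `CoASHR #8`), then accumulates the hH products. Because the hH terms are exact multiples of 2^16 in the full product, the result equals the full-precision dot product divided by 2^16 and rounded; the final `#data3` shift and the limiting are left out.

```c
#include <stdint.h>

/* Floor division by 2^16 (arithmetic right shift by 16) without relying
   on implementation-defined behavior of >> on negative values. */
static int64_t floor_div_65536(int64_t v) {
    return (v - (uint16_t)v) / 65536;
}

/* Two-pass extended-precision dot product: round(sum(h[k]*x[k]) / 2^16). */
int64_t dot_ext(const int32_t *h, const int16_t *x, int len) {
    int64_t acc = 0;
    for (int k = 0; k < len; ++k) {                  /* pass 1: LSWs (CoMULsu/CoMACsu) */
        uint16_t hl = (uint16_t)h[k];                /* hL(k), unsigned */
        acc += (int64_t)x[k] * hl;
    }
    acc = floor_div_65536(acc + 0x8000);             /* CoRND, CoASHR #8, CoASHR #8 */
    for (int k = 0; k < len; ++k) {                  /* pass 2: MSWs (CoMAC/CoMACM) */
        int16_t hh = (int16_t)floor_div_65536(h[k]); /* hH(k), signed */
        acc += (int64_t)x[k] * hh;
    }
    return acc;
}
```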
4 IIR Filters

4.1 Nth Order IIR filter: direct form 1

The rules for the implementation of FIR filters can be extended to IIR filters. The Nth-order difference equation is:

$$y(n) = \sum_{k=1}^{N} a(k)\,y(n-k) + \sum_{k=0}^{M} b(k)\,x(n-k)$$

This is called "Direct Form 1". The associated pseudo code is:

    ; x(n) = input signal at time n.
    ; y(n) = output signal at time n.
    ; a(k), b(k) = IIR coefficients.
    ; N, M refer to the above equation.
    ; y(n)=0;
    ; for (k=0 to M) { y(n) = y(n) + b(k)*x(n-k); }
    ; for (k=1 to N) { y(n) = y(n) + a(k)*y(n-k); }
Figure 7 shows the memory map. It has been assumed that the coefficients and samples have been initialized by another routine.
Figure 7 Memory map

    DPRAM, low to high address:
    Before: x(n-M) x(n-M+1) ... x(n-3) x(n-2) x(n-1) x(n-1)    IDX0 points to x(n-M)
            y(n-N) y(n-N+1) ... y(n-3) y(n-2) y(n-1)
    After:  x(n-M+1) x(n-M+2) ... x(n-2) x(n-1) x(n) x(n)
            y(n-N+1) y(n-N+2) ... y(n-2) y(n-1) y(n)          R10 holds the y(n) address
    b(M) b(M-1) ... b(3) b(2) b(1) b(0)                        R9 points to b(M)
    a(N) a(N-1) ... a(3) a(2) a(1)

This routine assumes that the following general purpose and co-processor registers (SFRs) have been initialized:
* R0 with 0000H.
* R1 with the 16-bit maximum tolerated value.
* R2 with the 16-bit minimum tolerated value.
* R9 with the b(M) address.
* R10 with the y(n) address.
* IDX0 with the x(n-M) address.
* QX0 with N+M.
* QR0 with N+M.
    ;
    ; Repeat count initialization (repeat count > 31) for the first IIR loop.
    ;
    MOV     MRW, #M-2               ; (MRW) <- M-2.
    ;
    ; Read the new filter input from an (E)SFR and move it into the DPRAM
    ; at the x(n-1) address, overwriting x(n-1).
    ;
    MOV     @x(n), ADC_sfr          ; Move the new input x(n).
    ;
    ; Prolog of the first IIR loop.
    ;
    CoMUL   [IDX0+], [R9+]          ; (ACC) <- b(M)*x(n-M); (IDX0) <- (IDX0)+2; (R9) <- (R9)+2.
    ;
    ; First IIR loop.
    ;
    REPEAT MRW TIMES
    CoMACM  [IDX0+], [R9+]          ; (ACC) <- (ACC) + b(i)*x(n-i) & x(n-i-1) <- x(n-i);
                                    ; (IDX0) <- (IDX0)+2; (R9) <- (R9)+2.
    ;
    ; Repeat count initialization (repeat count > 31) for the second IIR loop.
    ;
    MOV     MRW, #N-4               ; (MRW) <- N-4.
    ;
    ; Prolog of the second IIR loop.
    ;
    CoMAC   [IDX0+], [R9+]          ; (ACC) <- (ACC) + a(N)*y(n-N); (IDX0) <- (IDX0)+2; (R9) <- (R9)+2.
    ;
    ; Second IIR loop.
    ;
    REPEAT MRW TIMES
    CoMACM  [IDX0+], [R9+]          ; (ACC) <- (ACC) + a(i)*y(n-i) & y(n-i-1) <- y(n-i);
                                    ; (IDX0) <- (IDX0)+2; (R9) <- (R9)+2.
    ;
    ; Epilog of the second IIR loop.
    ;
    CoMACM  [IDX0-QX0], [R9-QR0]    ; (ACC) <- (ACC) + a(1)*y(n-1) & y(n-2) <- y(n-1);
                                    ; (IDX0) <- (IDX0)-2*(N+M); (R9) <- (R9)-2*(N+M).
    ;
    ; Rounding.
    ;
    CoRND                           ; (ACC) <- (ACC) + rnd.
    ;
    ; Limiting.
    ;
    CoMIN   R0, R1                  ; (ACC) <- min((ACC), MAX).
    CoMAX   R0, R2                  ; (ACC) <- max((ACC), MIN).
    ;
    ; Write the new filter output y(n) into memory.
    ;
    CoSTORE [R10], MAH              ; ((R10)) <- y(n).
    ;
    ; Write the new filter output y(n) into an (E)SFR.
    ;
    NOP                             ; Pipeline effect.
    MOV     DAC_sfr, MAH            ; Move the new output y(n).

                           Instruction cycles   Program words
    Read input sample              1                  2
    Initialization                 2                  4
    DF1 IIR loop                  N+M                10
    Post-processing                5                  9
    Write output sample            1                  2
    Total                        N+M+9               27
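A floating-point reference model of this Direct Form 1 routine can help when checking the fixed-point output. The sketch below follows the memory layout of Figure 7 (oldest sample first, coefficients in memory order) and reproduces the one-slot shift that CoMACM performs on both delay lines; it omits rounding and limiting. `df1_step` and its parameter layout are hypothetical.

```c
/* Floating-point reference for the Direct Form 1 routine (no rounding or
   limiting). Arrays follow the memory order of Figure 7:
   xs[0..M]   : x(n-M) ... x(n-1), spare slot that receives x(n)
   ys[0..N-1] : y(n-N) ... y(n-1)
   b[0..M]    : b(M) ... b(0)
   a[0..N-1]  : a(N) ... a(1) */
double df1_step(double *xs, int M, double *ys, int N,
                const double *b, const double *a, double xn) {
    xs[M] = xn;                            /* new input written to the spare slot */
    double acc = 0.0;
    for (int k = 0; k <= M; ++k) {         /* b(M)x(n-M) + ... + b(0)x(n) */
        acc += b[k] * xs[k];
        if (k > 0) xs[k - 1] = xs[k];      /* CoMACM data move */
    }
    for (int k = 0; k < N; ++k) {          /* a(N)y(n-N) + ... + a(1)y(n-1) */
        acc += a[k] * ys[k];
        if (k > 0) ys[k - 1] = ys[k];      /* CoMACM data move */
    }
    ys[N - 1] = acc;                       /* CoSTORE [R10]: y(n) */
    return acc;
}
```

With b(0) = 1, b(1) = 0 and a(1) = 0.5, the impulse response is 1, 0.5, 0.25, ..., following the "+a(k)" sign convention of the difference equation above.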
4.2 Nth Order IIR filter: direct form 2

The following equations are an equivalent representation of the Nth order IIR filter:

$$u(n) = x(n) + \sum_{k=1}^{N} a(k)\,u(n-k)$$

$$y(n) = \sum_{k=0}^{N} b(k)\,u(n-k)$$

These equations use the intermediate state variable vector U = {u(n), u(n-1), u(n-2), ..., u(n-N)}. This representation is called "Direct Form 2" and is illustrated in Figure 8. Direct Form 2 has an advantage over Direct Form 1 because it requires less data memory.
The associated pseudo code is:

    ; u(n)=x(n);
    ; for (k=1 to N) { u(n) = u(n) + a(k)*u(n-k); }
    ; y(n)=b(0)*u(n);
    ; for (k=1 to N) { y(n) = y(n) + b(k)*u(n-k); }
    ; x(n) = input signal at time n.
    ; u(n) = state variable at time n.
    ; y(n) = output signal at time n.
    ; a(k), b(k) = IIR coefficients.
    ; It is assumed that N = M.

[Figure 8 IIR Direct Form 2: signal flow from x(n) through u(n) and a chain of unit delays (Z-1) giving u(n-1), u(n-2), ..., u(n-N), with feedback coefficients a(1) ... a(N) and output coefficients b(0) ... b(N)]
Figure 9 shows the corresponding memory map. It has been assumed that the coefficients and samples have been initialized by another routine.
Figure 9 Memory map

    DPRAM, low to high address:
    Before: u(n-N) u(n-N+1) ... u(n-3) u(n-2) u(n-1) u(n-1)    IDX0 points to u(n-N)
    After:  u(n-N+1) u(n-N+2) ... u(n-2) u(n-1) u(n) u(n)
    a(N) a(N-1) ... a(3) a(2) a(1)                              R9 points to a(N)
    b(N) b(N-1) ... b(3) b(2) b(1) b(0)

This routine assumes that the following general purpose and co-processor registers (SFRs) have been initialized:
* R0 with 0000H.
* R1 with the 16-bit maximum tolerated value.
* R2 with the 16-bit minimum tolerated value.
* R9 with the a(N) address.
* R10 with the u(n) address (the last word of the state buffer).
* IDX0 with the u(n-N) address.
* QX0 with N-1 and QX1 with N.
* QR0 with 2N.
    ;
    ; Repeat count initialization (repeat count > 31) for the first IIR loop.
    ;
    MOV     MRW, #N-3               ; (MRW) <- N-3.
    ;
    ; Read the new filter input from an (E)SFR and move it into the accumulator.
    ;
    MOV     MAH, ADC_sfr            ; (MAH) <- x(n), (MAE) <- 8 times (MAH15), (MAL) <- 0000H.
    ;
    ; First IIR loop.
    ;
    REPEAT MRW TIMES
    CoMAC   [IDX0+], [R9+]          ; (ACC) <- (ACC) + a(i)*u(n-i); (IDX0) <- (IDX0)+2; (R9) <- (R9)+2.
    ;
    ; Epilog of the first IIR loop.
    ;
    CoMAC   [IDX0-QX0], [R9+], rnd  ; (ACC) <- (ACC) + a(1)*u(n-1) + rnd;
                                    ; (IDX0) <- (IDX0)-2*(N-1); (R9) <- (R9)+2.
    ;
    ; Repeat count initialization (repeat count > 31) for the second IIR loop.
    ;
    MOV     MRW, #N-4               ; (MRW) <- N-4.
    ;
    ; Move u(n) into memory.
    ;
    CoSTORE [R10], MAS              ; ((R10)) <- u(n).
    ;
    ; Prolog of the second IIR loop.
    ;
    CoMAC   [IDX0+], [R9+]          ; (ACC) <- b(N)*u(n-N); (IDX0) <- (IDX0)+2; (R9) <- (R9)+2.
    ;
    ; Second IIR loop.
    ;
    REPEAT MRW TIMES
    CoMACM  [IDX0+], [R9+]          ; (ACC) <- (ACC) + b(i)*u(n-i) & u(n-i-1) <- u(n-i);
                                    ; (IDX0) <- (IDX0)+2; (R9) <- (R9)+2.
    ;
    ; Epilog of the second IIR loop.
    ;
    CoMACM  [IDX0-QX1], [R9-QR0], rnd ; (ACC) <- (ACC) + b(0)*u(n) + rnd & u(n-1) <- u(n);
                                    ; (IDX0) <- (IDX0)-2N; (R9) <- (R9)-2N.
    ;
    ; Limiting.
    ;
    CoMIN   R0, R1                  ; (ACC) <- min((ACC), MAX).
    CoMAX   R0, R2                  ; (ACC) <- max((ACC), MIN).
    ;
    ; Write the new filter output y(n) into an (E)SFR.
    ;
    NOP                             ; Pipeline effect.
    MOV     DAC_sfr, MAH            ; Move the new output y(n).

                           Instruction cycles   Program words
    Read input sample              1                  2
    Initialization                 2                  4
    DF2 IIR loop                  2N+2               14
    Post-processing                2                  3
    Write output sample            1                  2
    Total                         2N+8               25
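The Direct Form 2 recursion, and the way a single state buffer serves both sums, can be sketched in C as a floating-point reference. The layout follows Figure 9 (state slots oldest first, the last slot receiving u(n); coefficients in memory order), the delay-line shift is done during the second sum as CoMACM does, and rounding and limiting are omitted. `df2_step` is a hypothetical name.

```c
/* Floating-point reference for the Direct Form 2 routine (no rounding or
   limiting). Arrays follow the memory order of Figure 9:
   us[0..N] : u(n-N) ... u(n-1), spare slot that receives u(n)
   a[0..N-1]: a(N) ... a(1)
   b[0..N]  : b(N) ... b(0) */
double df2_step(double *us, int N, const double *a, const double *b, double xn) {
    double acc = xn;                        /* MOV MAH, ADC_sfr */
    for (int k = 0; k < N; ++k)             /* + a(N)u(n-N) + ... + a(1)u(n-1) */
        acc += a[k] * us[k];
    us[N] = acc;                            /* CoSTORE [R10]: u(n) */
    double y = 0.0;
    for (int k = 0; k <= N; ++k) {          /* b(N)u(n-N) + ... + b(0)u(n) */
        y += b[k] * us[k];
        if (k > 0) us[k - 1] = us[k];       /* CoMACM data move */
    }
    return y;
}
```

Only N+1 state words are needed here, against M+1 input words plus N output words in the Direct Form 1 model, which is the data-memory saving mentioned above.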
4.3 N-cascaded real biquads (direct form 2)

A high-order filter can be implemented either as a single section or as a combination of first- and second-order sections. The single-section form is quicker and easier to implement, but it generates a larger numerical error, for two reasons:
* The long filter computation accumulates errors from multiplications with quantized coefficients.
* The roots of high-order polynomials are increasingly sensitive to changes in their quantized coefficients.

Therefore, the single-section form is not recommended except for very low-order controllers (see "Nth Order IIR filter: direct form 1", Section 4.1). To implement a high-order transfer function, first decompose it into first-order and second-order blocks (biquads), and then connect these blocks in cascade. The following paragraphs illustrate this technique for an even number of cascaded biquads.

Unlike conventional digital signal processors, the MAC co-processor can repeat a single instruction at high speed but does not offer flexible, fast hardware looping. Consequently, to perform a loop containing more than one instruction, the programmer must use the regular instruction set and incur a penalty of several cycles for the end-of-loop detection. Loop unrolling minimizes this penalty but increases the number of instructions; it is not used in the following section.

The equations of a Direct Form 2 Nth order IIR filter applied to a second-order filter (N=2) yield:

$$u_i(n) = x_i(n) - a_i(1)\,u_i(n-1) - a_i(2)\,u_i(n-2)$$

$$y_i(n) = b_i(0)\,u_i(n) + b_i(1)\,u_i(n-1) + b_i(2)\,u_i(n-2)$$

where "i" specifies the biquad number. Note that yi(n) = xi+1(n). For simplicity, it is assumed that no overflow occurs on ui(n) and yi(n). The naming convention is:
* xi(n) = input signal at time n of biquad number i.
* ui(n) = state variable at time n of biquad number i.
* yi(n) = output signal at time n of biquad number i.
* ai(k), bi(k) = coefficients of biquad number i.
Figure 10 shows the corresponding memory map and assumes that both coefficients and samples have been initialized by another routine.
Figure 10 Memory map

    DPRAM, low to high address:
    Before: u1(n-2) u1(n-1) u2(n-2) u2(n-1) ... uN(n-2) uN(n-1)    IDX0 points to u1(n-2),
                                                                    IDX1 points to u1(n-1)
    After:  u1(n-1) u1(n)   u2(n-1) u2(n)   ... uN(n-1) uN(n)
    a1(2) a1(1) b1(2) b1(1) b1(0) ... aN(2) aN(1) bN(2) bN(1) bN(0) R9 points to a1(2)

This routine assumes that the following general purpose and co-processor registers have been initialized:
* R0 with 0000H.
* R1 with the 16-bit maximum tolerated value.
* R2 with the 16-bit minimum tolerated value.
* R9 with the a1(2) address.
* R10 with the physical address of R5.
* IDX0 with the u1(n-2) address.
* IDX1 with the u1(n-1) address.
* QX0 with 2.
* QX1 with 2N-1.
* QR0 with 5N-1.
    ;
    ; Initialize the loop count: N.
    ;
    MOV     R3, #N-1                ; (R3) <- N-1.
    ;
    ; Read the new filter input from an (E)SFR and move it into the accumulator.
    ;
    MOV     MAH, ADC_sfr            ; (MAH) <- x(n), (MAE) <- 8 times (MAH15), (MAL) <- 0000H.
    DF2_BIQUAD_LOOP:
    ;
    ; First biquad iteration.
    ;
    CoMAC-  [IDX0+], [R9+]          ; (ACC) <- (ACC) - ai(2)*ui(n-2); (IDX0) <- (IDX0)+2; (R9) <- (R9)+2.
    CoMAC-  [IDX0-], [R9+]          ; (ACC) <- (ACC) - ai(1)*ui(n-1); (IDX0) <- (IDX0)-2; (R9) <- (R9)+2.
    CoRND                           ; (ACC) <- (ACC) + rnd.
    ;
    ; Write ui(n) into a GPR (R5).
    ;
    CoSTORE R5, MAS                 ; (R5) <- ui(n).
    ;
    ; Second biquad iteration.
    ;
    CoMAC   [IDX0+], [R9+]          ; (ACC) <- (ACC) + bi(2)*ui(n-2); (IDX0) <- (IDX0)+2; (R9) <- (R9)+2.
    CoMACM  [IDX0+], [R9+]          ; (ACC) <- (ACC) + bi(1)*ui(n-1) & ui(n-2) <- ui(n-1);
                                    ; (IDX0) <- (IDX0)+2; (R9) <- (R9)+2.
    CoMAC   R5, [R9+]               ; (ACC) <- (ACC) + bi(0)*ui(n); (R9) <- (R9)+2.
    ;
    ; Write ui(n) to memory.
    ;
    CoMOV   [IDX1+QX0], [R10]       ; ui(n-1) <- ui(n); (IDX1) <- (IDX1)+4.
    ;
    ; End-of-loop checking.
    ;
    CMPD1   R3, #0h                 ; (R3) <- (R3)-1.
    JMPR    cc_Z, DF2_BIQUAD_LOOP   ; End-of-loop test & branch.
    ;
    ; First iteration of the last biquad.
    ;
    CoMAC-  [IDX0+], [R9+]          ; (ACC) <- (ACC) - aN(2)*uN(n-2); (IDX0) <- (IDX0)+2; (R9) <- (R9)+2.
    CoMAC-  [IDX0-], [R9+]          ; (ACC) <- (ACC) - aN(1)*uN(n-1); (IDX0) <- (IDX0)-2; (R9) <- (R9)+2.
    ; Note that CoMAC- and CoRND cannot be combined.
    CoRND                           ; (ACC) <- (ACC) + rnd.
    ;
    ; Write uN(n) into a GPR (R5).
    ;
    CoSTORE R5, MAS                 ; (R5) <- uN(n).
    ;
    ; Second iteration of the last biquad.
    ;
    CoMAC   [IDX0+], [R9+]          ; (ACC) <- (ACC) + bN(2)*uN(n-2); (IDX0) <- (IDX0)+2; (R9) <- (R9)+2.
    CoMACM  [IDX0-QX1], [R9+]       ; (ACC) <- (ACC) + bN(1)*uN(n-1) & uN(n-2) <- uN(n-1);
                                    ; (IDX0) <- (IDX0)-2*(2N-1); (R9) <- (R9)+2.
    CoMAC   R5, [R9-QR0], rnd       ; (ACC) <- (ACC) + bN(0)*uN(n) + rnd; (R9) <- (R9)-2*(5N-1).
    ;
    ; Write uN(n) to memory.
    ;
    CoMOV   [IDX1-QX1], [R10]       ; uN(n-1) <- uN(n); (IDX1) <- (IDX1)-2*(2N-1).
    ;
    ; Limiting.
    ;
    CoMIN   R0, R1                  ; (ACC) <- min((ACC), MAX).
    CoMAX   R0, R2                  ; (ACC) <- max((ACC), MIN).
    ;
    ; Write the new filter output y(n) into an (E)SFR.
    ;
    NOP                             ; Pipeline effect.
    MOV     DAC_sfr, MAH            ; Move the new output y(n).

                           Instruction cycles   Program words
    Read input sample              1                  2
    Initialization                 1                  2
    DF2 biquad loop              10N-1               19
    Post-processing                3                  5
    Write output sample            1                  2
    Total                        10N+5               31
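The cascade structure can be sketched in C as a floating-point reference that follows the two biquad equations directly, section by section, with each section's output feeding the next section. The state and coefficient arrays use the memory order of Figure 10; rounding, limiting and the register-level data moves are left out, and `biquad_df2_cascade` is a hypothetical name.

```c
/* Floating-point reference for N cascaded Direct Form 2 biquads:
   u_i(n) = x_i(n) - a_i(1)u_i(n-1) - a_i(2)u_i(n-2)
   y_i(n) = b_i(0)u_i(n) + b_i(1)u_i(n-1) + b_i(2)u_i(n-2),  x_{i+1}(n) = y_i(n)
   state[2i], state[2i+1] : u_i(n-2), u_i(n-1)
   coef[5i .. 5i+4]       : a_i(2), a_i(1), b_i(2), b_i(1), b_i(0)  (memory order) */
double biquad_df2_cascade(double *state, const double *coef, int nsec, double x) {
    for (int i = 0; i < nsec; ++i) {
        double *s = &state[2 * i];
        const double *c = &coef[5 * i];
        double u = x - c[0] * s[0] - c[1] * s[1];          /* u_i(n) */
        double y = c[2] * s[0] + c[3] * s[1] + c[4] * u;    /* y_i(n) */
        s[0] = s[1];                                        /* u_i(n-2) <- u_i(n-1) */
        s[1] = u;                                           /* u_i(n-1) <- u_i(n) */
        x = y;                                              /* output feeds next section */
    }
    return x;
}
```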
4.4 N-cascaded real biquads: transpose form

The equations of a Direct Form 2 Nth order IIR filter applied to a second-order filter (N=2) can also be written as:

$$y_i(n) = b_i(0)\,x_i(n) + u_i(n-1)$$

$$u_i(n) = b_i(1)\,x_i(n) - a_i(1)\,y_i(n) + w_i(n-1)$$

$$w_i(n) = b_i(2)\,x_i(n) - a_i(2)\,y_i(n)$$

where "i" is the biquad number. Note that yi(n) = xi+1(n). This form is also called the "Transpose Form".

For simplicity, it is assumed that no overflow occurs on ui(n) or yi(n). This form is suitable when the input-to-output delay must be minimized. The naming convention is:
* xi(n) = input signal at time n of biquad number i.
* ui(n), wi(n) = state variables at time n of biquad number i.
* yi(n) = output signal at time n of biquad number i.
* ai(k), bi(k) = coefficients of biquad number i.
Figure 11 shows the corresponding memory map. It is assumed that both coefficients and samples have been initialized by another routine.
Figure 11 Memory map

    DPRAM, low to high address:
    Before: u1(n-1) w1(n-1) u2(n-1) w2(n-1) ... uN(n-1) wN(n-1)    R4 points to u1(n-1)
    After:  u1(n)   w1(n)   u2(n)   w2(n)   ... uN(n)   wN(n)
    b1(0) a1(1) b1(1) a1(2) b1(2) ... bN(0) aN(1) bN(1) aN(2) bN(2) R9 points to b1(0)

This routine assumes that the following general purpose and co-processor registers (SFRs) have been initialized:
* R0 with 0000H.
* R1 with the 16-bit maximum tolerated value.
* R2 with the 16-bit minimum tolerated value.
* R4 with the u1(n-1) address.
* R9 with the b1(0) address.
* R10 with the physical address of R5.
* QR0 with 2N-1.
* QR1 with 5N-1.
; ; Initialize the Loop Count: N ; MOV ; R3 #N-1
; (R3) N-1.
; Read the new filter input from a (E)SFR and move it into a GPR (R5). ; MOV R5, ADC_sfr ; (R5) x(n) ; TF_BIQUAD_LOOP: ; ; Compute yi(n) ; CoLOAD [R4+],
CoMAC [R9+],
R0 R5 rnd
; (ACC) ui(n-1) ; (R4) (R4)+2. ; (ACC) (ACC)+bi(0)*xi(n) ; +rnd ; (R9) (R9)+2.
;
; Write yi(n) into R8. ; CoSTORE R8,
;
MAS
; (R8) limited(yi(n)).
; Compute ui(n) ; CoLOAD [R4],
CoMACCoMAC ; [R9+], [R9+],
R0 R8 R5 rnd
; (ACC) wi(n-1) ; (ACC) (ACC)-ai(1)*yi(n)+ ; (R9) (R9)+2. ; (ACC) (ACC)+bi(1)*xi(n)+rnd ; (R9) (R9)+2.
36/54
72 - TCH - 170 - 00
ST10 - DSP MAC SIGNAL PROCESSING ALGORITHMS
; Write ui(n) into memory. ; CoSTORE [R4+],
;
MAS
; ui(n-1) ui(n). ; (R4) (R4)+2.
; Compute wi(n) ; CoMUL[R9+],
CoMAC ; [R9+],
R8 R5 rnd
; (ACC) -ai(2)*yi(n) ; (R9) (R9)+2. ; (ACC) (ACC)+bi(2)*xi(n)+rnd ; (R9) (R9)+2.
; Write wi(n) into memory. ; CoSTORE [R4+],
;
MAS
; wi(n-1) wi(n). ; (R4) (R4)+2.
; Write yi(n) into R5. ; MOV R5
;
R8
; xi+1(n) yi(n).
; End_of_loop Checking. ; CMPD1 R3 JMPR cc_Z ; ; Compute yN(n) ; CoLOAD [R4+],
CoMAC [R9+],
#0h
; (R3) (R3) -1. DF2_BIQUAD_LOOP ; End-of-Loop test & branch.
R0 R5 rnd
; (ACC) uN(n-1) ; (R4) (R4)+2. ; (ACC) (ACC)+bN(0)*xN(n) ; +rnd ; (R9) (R9)+2.
;
; Write yN(n) into R8. ; CoSTORE R8,
MAS
; (R8) limited(y(n)).
72 - TCH - 170 - 00
37/53
ST10 - DSP MAC SIGNAL PROCESSING ALGORITHMS
;
; Limiting
;
        CoMIN   R0, R1                  ; (ACC) <- Min((ACC), MAX).
        CoMAX   R0, R2                  ; (ACC) <- Max((ACC), MIN).
;
; Write the new filter output y(n) into an (E)SFR.
;
        CoLOAD  R0, [R4-]               ; (ACC) <- wN(n-1)
                                        ; (R4) <- (R4)-2.
        MOV     DAC_sfr, MAH            ; move the new output y(n).
;
; Compute uN(n)
;
        CoMAC-  R8, [R9+]               ; (ACC) <- (ACC)-aN(1)*yN(n)
                                        ; (R9) <- (R9)+2.
        CoMAC   R5, [R9+], rnd          ; (ACC) <- (ACC)+bN(1)*xN(n)+rnd
                                        ; (R9) <- (R9)+2.
;
; Write uN(n) into memory.
;
        CoSTORE [R4+], MAS              ; uN(n-1) <- uN(n).
                                        ; (R4) <- (R4)+2.
;
; Compute wN(n)
;
        CoMUL-  R8, [R9+]               ; (ACC) <- -aN(2)*yN(n)
                                        ; (R9) <- (R9)+2.
        CoMAC   R5, [R9-QR1], rnd       ; (ACC) <- (ACC)+bN(2)*xN(n)+rnd
                                        ; (R9) <- (R9)-2*(5N-1).
;
; Write wN(n) into memory.
;
        CoSTORE [R4-QR0], MAS           ; wN(n-1) <- wN(n).
                                        ; (R4) <- (R4)-2*(2N-1).
                          Instruction Cycles    Program Words
Read Input sample         1                     2
Initialization            1                     2
TF Biquad Loop            13N-1                 44
Post-Processing           2                     4
Write Output sample       1                     2
Total                     13N+4                 54
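As a cross-check of the recurrences the loop above computes for each section (yi(n) = ui(n-1) + bi(0)*xi(n), ui(n) = wi(n-1) - ai(1)*yi(n) + bi(1)*xi(n), wi(n) = -ai(2)*yi(n) + bi(2)*xi(n), then xi+1(n) = yi(n)), the following C sketch filters one sample through N cascaded transpose-form sections in floating point. The struct tf_biquad, its field layout and the function tf_biquad_cascade are hypothetical; fixed-point scaling, rounding and the MAX/MIN limiting of the assembly routine are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-section layout: coefficients in the b(0), a(1), b(1),
 * a(2), b(2) order read through R9, plus the u(n-1), w(n-1) state pair
 * read and written through R4. */
typedef struct {
    double b0, a1, b1, a2, b2;  /* section coefficients */
    double u, w;                /* transpose-form state u(n-1), w(n-1) */
} tf_biquad;

/* Filter one input sample through n_sections cascaded sections. */
double tf_biquad_cascade(tf_biquad *s, size_t n_sections, double x)
{
    assert(s != NULL || n_sections == 0);
    for (size_t i = 0; i < n_sections; i++) {
        double y = s[i].u + s[i].b0 * x;              /* yi(n) */
        s[i].u = s[i].w - s[i].a1 * y + s[i].b1 * x;  /* ui(n) */
        s[i].w = -s[i].a2 * y + s[i].b2 * x;          /* wi(n) */
        x = y;                                        /* xi+1(n) = yi(n) */
    }
    return x;
}
```

With one section set to b(0)=0.5, b(1)=0.25, a(1)=-0.5 and the other coefficients zero, a unit impulse produces 0.5, 0.5, 0.25, 0.125, ..., which is the impulse response of y(n) = 0.5x(n) + 0.25x(n-1) + 0.5y(n-1).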
5 LMS Adaptive Filter
5.1 Single-precision LMS adaptive filter
An adaptive filter has coefficients that are updated by an adaptive algorithm to optimize the filter's response according to a desired performance criterion. In general, an adaptive filter has two distinct parts: a filter whose structure is designed to perform a processing function, and an adaptive algorithm that adjusts the coefficients of that filter to improve its performance. The incoming signal x(n) is weighted in a digital filter to produce an output y(n). The adaptive algorithm adjusts the filter weights to minimize the error e(n) between the filter output y(n) and the desired response d(n). The single-precision LMS adaptive filter is an FIR filter whose coefficients are updated at each iteration according to an error signal e(n) = d(n) - y(n), where d(n) is the desired signal at time n and y(n) is the FIR output. Figure 12 illustrates this filter.
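Figure 12 LMS Adaptive Filter (block diagram: the input x(n) passes through a chain of unit delays producing x(n-1), x(n-2), ..., x(n-L+1); the taps are weighted by h(0,n), h(1,n), h(2,n), ..., h(L-1,n) and summed to form y(n), and y(n) is compared with d(n) to form the error e(n).)
The corresponding pseudo code is: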
; x(n)   = input signal at time n.
; d(n)   = desired signal at time n.
; y(n)   = output signal at time n.
; h(k,n) = k'th coefficient at time n.
; Mu     = adaptive gain.
; L      = number of coefficient taps in the filter.
;
y(n) = 0;
for (k = 0 to L-1) {
    y(n) = y(n) + h(k,n)*x(n-k);
}
e(n) = d(n) - y(n);
for (k = 0 to L-1) {
    h(k,n+1) = h(k,n) - Mu*x(n-k)*e(n);
}
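The pseudo code can be sketched in C as follows, in floating point, with hypothetical helper names lms_push (delay line) and lms_step (FIR output, error and update). One caution on the sign: with e(n) = d(n) - y(n) and a positive Mu, the steepest-descent LMS update is h(k,n+1) = h(k,n) + Mu*x(n-k)*e(n), which is what this sketch uses; the minus sign in the pseudo code corresponds to defining e(n) as y(n) - d(n), or to taking Mu negative.

```c
#include <assert.h>
#include <stddef.h>

/* Age the delay line by one sample and insert x(n); x[k] holds x(n-k). */
void lms_push(double *x, size_t L, double xn)
{
    assert(L > 0);
    for (size_t k = L - 1; k > 0; k--)
        x[k] = x[k - 1];
    x[0] = xn;
}

/* One LMS iteration: h[k] holds h(k,n) on entry and h(k,n+1) on exit.
 * Returns the error e(n) = d(n) - y(n). */
double lms_step(double *h, const double *x, size_t L, double d, double mu)
{
    double y = 0.0;
    for (size_t k = 0; k < L; k++)
        y += h[k] * x[k];            /* y(n) = sum of h(k,n)*x(n-k) */
    double e = d - y;                /* e(n) = d(n) - y(n) */
    for (size_t k = 0; k < L; k++)
        h[k] += mu * e * x[k];       /* steepest-descent update */
    return e;
}
```

For example, identifying the 2-tap system d(n) = 0.5x(n) + 0.25x(n-1) from a noise-like input with Mu = 0.05 drives h toward {0.5, 0.25}.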
Figure 13 shows the corresponding memory map. Both the coefficients and the samples are assumed to have been initialized by another routine. Unlike pure DSP filters, this filter is implemented in two steps: the FIR output is computed first, and the coefficients are then updated.
Figure 13 Memory map (before and after one iteration). The DPRAM sample buffer runs from x(n-L+1) at the low address (pointed to by IDX0 and R10) up to x(n-1); the coefficient table runs from h(L-1,n) at the low address (pointed to by R9) up to h(0,n) at the high address. After the iteration, the sample buffer holds x(n-L+2) ... x(n) and the coefficient table holds h(L-1,n+1) ... h(0,n+1).
This routine assumes that the following general-purpose and co-processor registers (SFRs) have been initialized once and for all, and that L is less than 31:
* R0 with 0000H
* R1 with the 16-bit MAXimum tolerated value
* R2 contains the 16-bit MINimum tolerated value
* R9 contains the h(L-1,n) address
* R10 contains the x(n-L+1) address
* IDX0 contains the x(n-L+1) address
* QX0 and QR0 with L-1
;
; Read the new filter input from an (E)SFR and move it into the DPRAM
; at the x(n-1) address, thereby overwriting x(n-1).
;
        MOV     @x(n), ADC_sfr          ; move the new input x(n)
;
; FIR prolog: first multiplication
;
        CoMUL   [IDX0+], [R9+]          ; (ACC) <- h(L-1)*x(n-L+1)
                                        ; (IDX0) <- (IDX0)+2,
                                        ; (R9) <- (R9)+2.
;
; FIR loop: repeat the same MAC instruction L-2 times.
;
        REPEAT  L-3 TIMES
        CoMAC   [IDX0+], [R9+]          ; (ACC) <- (ACC)+h(i)*x(n-i)
                                        ; (IDX0) <- (IDX0)+2,
                                        ; (R9) <- (R9)+2.
;
; FIR epilog: last MAC instruction; provide y(n) in an appropriate format
;
        CoMAC   [IDX0-QX0], [R9-QR0]    ; (ACC) <- (ACC)+h(0)*x(n)
                                        ; (IDX0) <- (IDX0)-2*(L-1),
                                        ; (R9) <- (R9)-2*(L-1).
;
; Shift & Rounding
;
        CoASHR  #data3, rnd             ; (ACC) <- (ACC)>>a #data3 +rnd
;
; Limiting
;
        CoMIN   R0, R1                  ; (ACC) <- Min((ACC), MAX).
        CoMAX   R0, R2                  ; (ACC) <- Max((ACC), MIN).
;
; Write the new filter output y(n) into an (E)SFR.
;
        NOP                             ; Pipeline Effect.
        MOV     DAC_sfr, MAH            ; move the new output y(n).
;
; Read d(n) and move it into a GPR.
;
        MOV     R5, @d(n)               ; (R5) <- d(n)
;
; Error, e(n), Calculation
;
        SUB     R5, R6                  ; (R5) <- d(n)-y(n)=e(n)
        MOV     @e(n), R5               ; e(n-1) <- e(n)
        CoMUL   R5, R8                  ; (ACC) <- Mu*e(n)
        CoNEG   rnd                     ; (ACC) <- -(ACC)+rnd
        CoSTORE R11, MAS                ; (R11) <- -Mu*e(n).
;
; Coefficients' Updating
;
        MOV     R3, #L-2                ; (R3) <- L-2
;
; Coefficient Update Prolog.
;
        CoLOAD  R0, [R9]                ; (ACC) <- h(L-1,n)
        CoMAC   R11, [R10+], rnd        ; (ACC) <- h(L-1,n)-Mu*e(n)*x(n-L+1)+rnd
                                        ; (R10) <- (R10)+2.
        CoSTORE [R9+], MAS              ; h(L-1,n) <- h(L-1,n+1).
                                        ; (R9) <- (R9)+2.
;
; Coefficient Update Loop.
;
LMS_LOOP:
        CoLOAD  R0, [R9]                ; (ACC) <- h(k,n)
        CoMACM  R11, [R10+], rnd        ; (ACC) <- h(k,n)-Mu*e(n)*x(n-k)+rnd
                                        ; x(n-k-1) <- x(n-k).
                                        ; (R10) <- (R10)+2.
        CoSTORE [R9+], MAS              ; h(k,n) <- h(k,n+1).
                                        ; (R9) <- (R9)+2.
;
; End_of_loop Checking.
;
        CMPD1   R3, #0h                 ; (R3) <- (R3)-1.
        JMPR    cc_Z, LMS_LOOP          ; End-of-Loop test & branch.
;
; Coefficient Update epilog.
;
        CoLOAD  R0, [R9]                ; (ACC) <- h(0,n)
        CoMACM  R11, [R10-QR0], rnd     ; (ACC) <- h(0,n)-Mu*e(n)*x(n)+rnd
                                        ; x(n-1) <- x(n).
                                        ; (R10) <- (R10)-2*(L-1).
        CoSTORE [R9-QR0], MAS           ; h(0,n) <- h(0,n+1).
                                        ; (R9) <- (R9)-2*(L-1).
                          Instruction Cycles    Program Words
Read Input samples        2                     4
Initialization            1                     2
LMS Loop                  4L+2(L-2)+1           25
Post/Pre-Processing       9                     16
Write Output sample       2                     4
Total                     4L+2(L-2)+15          51
Note
The branch penalty accounts for roughly one third of the execution time of the LMS loop (the 2(L-2) term out of about 6L cycles for large L). As shown in "N-real multiply (windowing)" on page 10, this penalty can be reduced by unrolling the loop. If URF is the unrolling factor, the execution time and the program size become 4L+2(L-2)/URF+1 instruction cycles and 51+2(URF-1) program words, respectively.
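The update pass above relies on CoMACM doing two jobs per coefficient: accumulating h(k,n) + (-Mu*e(n))*x(n-k) and aging the delay line by copying x(n-k) into the x(n-k-1) slot. The following C sketch models that pass in Q15 under stated assumptions: fractional products (product shifted left by one), rounding by adding 0x8000 before the high word is taken, and saturation to 16 bits standing in for the MAS limiter. The function names are hypothetical, and the 40-bit accumulator is approximated with a 64-bit integer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Saturate to the signed 16-bit range (stand-in for the MAS limiter). */
static int16_t sat16(int64_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Floor division by 2^16: an arithmetic shift right by 16 that does not
 * rely on implementation-defined >> of negative values. */
static int64_t asr16(int64_t v)
{
    return v >= 0 ? v / 65536 : -((-v + 65535) / 65536);
}

/* Hypothetical Q15 model of the coefficient-update pass.
 * Arrays are stored oldest first, as in memory:
 *   h[0] = h(L-1,n) ... h[L-1] = h(0,n)
 *   x[0] = x(n-L+1) ... x[L-1] = x(n)
 * g is -Mu*e(n) in Q15, the value the routine keeps in R11. */
void lms_update_q15(int16_t *h, int16_t *x, size_t L, int16_t g)
{
    assert(L == 0 || (h != NULL && x != NULL));
    for (size_t i = 0; i < L; i++) {
        /* ACC = h:0000 + (g*x << 1) + rounding constant */
        int64_t acc = (int64_t)h[i] * 65536 + (int64_t)g * x[i] * 2 + 0x8000;
        h[i] = sat16(asr16(acc));      /* limited high word, like MAS */
        if (i + 1 < L)
            x[i] = x[i + 1];           /* data move: x(n-k-1) <- x(n-k) */
    }
}
```

After one pass the sample buffer reads x(n-L+2) ... x(n), x(n), as in the "after" state of Figure 13, and the top slot is ready to receive the next input.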
5.2 Extended-precision LMS adaptive filter
16-bit coefficients can be insufficient for LMS filtering. The following routine implements an LMS filter with 32-bit coefficients and 16-bit samples; in most applications, 24-bit coefficients already provide good results. The extended-precision LMS adaptive filter uses the same naming convention as the single-precision LMS adaptive filter, with hL(k,n) and hH(k,n) denoting the least significant and most significant words, respectively, of the k'th coefficient at time n. For simplicity, it is advisable to clear the MP bit of MCW. Figure 14 shows the corresponding memory map. Both the coefficients and the samples are assumed to have been initialized by another routine. Note that, unlike the "Single-precision LMS adaptive filter" on page 40, this loop is not "unrolled".
Figure 14 Memory map (before and after one iteration). The DPRAM sample buffer is organized as in Figure 13, with IDX0 pointing to x(n-L+1); after the iteration it holds x(n-L+2) ... x(n). The 32-bit coefficients are stored as interleaved word pairs hL(L-1), hH(L-1), hL(L-2), hH(L-2), ..., hL(0), hH(0), from the low address to the high address, with R9 and IDX1 pointing to the start of the table.
This routine assumes that the following general-purpose and co-processor registers (SFRs) have been initialized once and for all, and that L is less than 31:
* R0 with 0000H
* R1 with the 16-bit MAXimum tolerated value
* R2 contains the 16-bit MINimum tolerated value
* R9 contains the hL(L-1) address
* R10 contains the hH(L-1) address
* IDX0 contains the x(n-L+1) address
* QX0 with L-1
* QX1 and QR0 with 2
* QR1 with 2*L-1
;
; Read the new filter input from an (E)SFR and move it into the DPRAM at the
; x(n-1) address, thereby overwriting x(n-1).
;
        MOV     @x(n), ADC_sfr          ; move the new input x(n)
;
; FIR prolog (LSWs of impulse response): first multiplication
;
        CoMULsu [IDX0+], [R9+QR0]       ; (ACC) <- hL(L-1)*x(n-L+1)
                                        ; (IDX0) <- (IDX0)+2,
                                        ; (R9) <- (R9)+4.
;
; FIR loop (LSWs of impulse response): repeat the same MAC instruction L-2 times.
;
        REPEAT  L-3 TIMES
        CoMACsu [IDX0+], [R9+QR0]       ; (ACC) <- (ACC)+hL(i)*x(n-i)
                                        ; (IDX0) <- (IDX0)+2,
                                        ; (R9) <- (R9)+4.
;
; FIR epilog (LSWs of impulse response): last MAC instruction; provide
; y(n) in an appropriate format
;
        CoMACsu [IDX0-QX0], [R9-QR1], rnd ; (ACC) <- (ACC)+hL(0)*x(n)+rnd
                                        ; (IDX0) <- (IDX0)-2*(L-1),
                                        ; (R9) <- (R9)-2*(2L-1).
;
; Shift
;
        CoASHR  #8                      ; (ACC) <- (ACC)>>8
        CoASHR  #8                      ; (ACC) <- (ACC)>>8
;
; FIR prolog (MSWs of impulse response): first multiplication
;
        CoMAC   [IDX0+], [R10+QR0]      ; (ACC) <- (ACC)+hH(L-1)*x(n-L+1)
                                        ; (IDX0) <- (IDX0)+2,
                                        ; (R10) <- (R10)+4.
;
; FIR loop (MSWs of impulse response): repeat the same MAC instruction L-2 times.
;
        REPEAT  L-3 TIMES
        CoMAC   [IDX0+], [R10+QR0]      ; (ACC) <- (ACC)+hH(i)*x(n-i)
                                        ; (IDX0) <- (IDX0)+2,
                                        ; (R10) <- (R10)+4.
;
; FIR epilog (MSWs of impulse response): last MAC instruction; provide
; y(n) in an appropriate format
;
        CoMAC   [IDX0-QX0], [R10-QR1]   ; (ACC) <- (ACC)+hH(0)*x(n)
                                        ; (IDX0) <- (IDX0)-2*(L-1),
                                        ; (R10) <- (R10)-2*(2L-1).
;
; Shift & Rounding
;
        CoASHR  #data3, rnd             ; (ACC) <- (ACC)>>a #data3 +rnd
;
; Limiting
;
        CoMIN   R0, R1                  ; (ACC) <- Min((ACC), MAX).
        CoMAX   R0, R2                  ; (ACC) <- Max((ACC), MIN).
;
; Write the new filter output y(n) into an (E)SFR.
;
        NOP                             ; Pipeline Effect.
        MOV     DAC_sfr, MAH            ; move the new output y(n).
;
; Read d(n) and move it into a GPR.
;
        MOV     R5, @d(n)               ; (R5) <- d(n)
;
; Error, e(n), Calculation
;
        SUB     R5, R6                  ; (R5) <- d(n)-y(n)=e(n)
        MOV     @e(n), R5               ; e(n-1) <- e(n)
        CoMUL   R5, R8                  ; (ACC) <- Mu*e(n)
        CoNEG   rnd                     ; (ACC) <- -(ACC)+rnd
        CoSTORE R11, MAS                ; (R11) <- -Mu*e(n).
;
; Coefficients' Updating
;
        MOV     R12, IDX0               ; (R12) <- (IDX0)
        MOV     IDX1, R9                ; (IDX1) <- (R9)
        MOV     R3, #L-2                ; (R3) <- L-2
;
; Coefficient Update Prolog.
;
        CoLOAD  [IDX1+QX1], [R10-]      ; (ACC) <- h(L-1,n)
                                        ; (IDX1) <- (IDX1)+4.
                                        ; (R10) <- (R10)-2.
        CoMAC   R11, [R12+]             ; (ACC) <- h(L-1,n)-Mu*e(n)*x(n-L+1)+rnd
                                        ; (R12) <- (R12)+2.
        CoSTORE [R10+], MAL             ; hL(L-1,n) <- hL(L-1,n+1).
                                        ; (R10) <- (R10)+2.
        CoSTORE [R10+QR0], MAH          ; hH(L-1,n) <- hH(L-1,n+1).
                                        ; (R10) <- (R10)+4.
;
; Coefficient Update Loop.
;
EXT_LMS_LOOP:
        CoLOAD  [IDX1+QX1], [R10-]      ; (ACC) <- h(k,n)
                                        ; (IDX1) <- (IDX1)+4.
                                        ; (R10) <- (R10)-2.
        CoMACM  R11, [R12+]             ; (ACC) <- h(k,n)-Mu*e(n)*x(n-k)+rnd
                                        ; x(n-k-1) <- x(n-k).
                                        ; (R12) <- (R12)+2.
        CoSTORE [R10+], MAL             ; hL(k,n) <- hL(k,n+1).
                                        ; (R10) <- (R10)+2.
        CoSTORE [R10+QR0], MAH          ; hH(k,n) <- hH(k,n+1).
                                        ; (R10) <- (R10)+4.
;
; End_of_loop Checking.
;
        CMPD1   R3, #0h                 ; (R3) <- (R3)-1.
        JMPR    cc_Z, EXT_LMS_LOOP      ; End-of-Loop test & branch.
;
; Coefficient Update epilog.
;
        CoLOAD  [IDX1+QX1], [R10-]      ; (ACC) <- h(0,n)
                                        ; (IDX1) <- (IDX1)+4.
                                        ; (R10) <- (R10)-2.
        CoMACM  R11, [R12+]             ; (ACC) <- h(0,n)-Mu*e(n)*x(n)+rnd
                                        ; x(n-1) <- x(n).
                                        ; (R12) <- (R12)+2.
        CoSTORE [R10+], MAL             ; hL(0,n) <- hL(0,n+1).
                                        ; (R10) <- (R10)+2.
        CoSTORE [R10-QR0], MAH          ; hH(0,n) <- hH(0,n+1).
                                        ; (R10) <- (R10)-(2L-1).

                          Instruction Cycles    Program Words
Read Input samples        2                     4
Initialization            4                     8
EXT LMS Loop              6L+2(L-2)+3           39
Post/Pre-Processing       9                     16
Write Output sample       2                     4
Total                     6L+2(L-2)+20          71
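The two FIR passes above split each 32-bit coefficient as h = hH*2^16 + hL, with hL treated as unsigned (hence CoMULsu/CoMACsu) and hH as signed. Because the LSW partial sum is rounded and shifted right by 16 before the MSW products are accumulated, the result equals floor((sum of x*h + 2^15) / 2^16). The C sketch below reproduces that two-pass computation with integer products, assuming MP is cleared as recommended; the function names are hypothetical, and the final CoASHR #data3 scaling and the limiting are not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Floor division by 2^16 (arithmetic shift right by 16) without relying on
 * implementation-defined >> of negative values. */
static int64_t asr16(int64_t v)
{
    return v >= 0 ? v / 65536 : -((-v + 65535) / 65536);
}

/* Hypothetical model of the extended-precision FIR sum: h[i] is a 32-bit
 * coefficient and x[i] the matching 16-bit sample. */
int64_t fir32x16_two_pass(const int32_t *h, const int16_t *x, size_t L)
{
    assert(L == 0 || (h != NULL && x != NULL));
    int64_t acc = 0;
    /* Pass 1 (CoMULsu/CoMACsu): signed sample times unsigned low word hL. */
    for (size_t i = 0; i < L; i++) {
        uint16_t hl = (uint16_t)((uint32_t)h[i] & 0xFFFFu);
        acc += (int64_t)x[i] * hl;
    }
    /* rnd on the last LSW product, then CoASHR 8 twice: rounded >> 16. */
    acc = asr16(acc + 0x8000);
    /* Pass 2 (CoMAC): signed sample times signed high word hH. */
    for (size_t i = 0; i < L; i++) {
        uint16_t hl = (uint16_t)((uint32_t)h[i] & 0xFFFFu);
        int64_t hh = ((int64_t)h[i] - hl) / 65536;  /* exact: h = hH*2^16 + hL */
        acc += (int64_t)x[i] * hh;
    }
    return acc;
}
```

The decomposition is exact because sum(x*h) = 2^16*sum(x*hH) + sum(x*hL), and the 2^16 multiple passes through the floor unchanged.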
6 Operations on Tables
6.1 Table move
This routine moves a table of L 16-bit data items from one memory location to another. "Orig_Address" is the address of the first element of the table, and "Dst_Address" is its address after the move.
;
; MAC dedicated registers' initialization:
;
        MOV     MRW, #L-1               ; (MRW) <- L-1.
        EXTR    #1                      ; next instruction will
                                        ; utilize the ESFR space.
        MOV     IDX0, #Dst_Address      ; (IDX0) <- Dst_Address.
;
; GPR initialization:
;
        MOV     R1, #Orig_Address       ; (R1) <- Orig_Address
;
; Move
;
        REPEAT  MRW TIMES
        CoMOV   [IDX0+], [R1+]          ; ((IDX0)) <- ((R1))
                                        ; (IDX0) <- (IDX0)+2
                                        ; (R1) <- (R1)+2.
          Instruction Cycles    Program Words
Total     L+4                   9
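In C terms, the table move is a forward word copy. The sketch below (hypothetical function table_move) makes the repeat-count convention explicit: MRW is loaded with L-1, and REPEAT MRW TIMES executes CoMOV MRW+1 = L times, the same convention as REPEAT L-3 TIMES performing L-2 MACs in the FIR loops.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the table move: MRW = L-1 and the repeated CoMOV
 * runs MRW+1 = L times, copying one 16-bit word per pass. */
void table_move(int16_t *dst, const int16_t *src, size_t L)
{
    assert(L >= 1 && dst != NULL && src != NULL);  /* MRW = L-1 must not underflow */
    size_t mrw = L - 1;                            /* value loaded into MRW */
    for (size_t i = 0; i <= mrw; i++)              /* CoMOV [IDX0+], [R1+] */
        *dst++ = *src++;
}
```

Because both pointers increase, a destination that overlaps the source at a higher address would be overwritten before it is read; the sketch shares that limitation.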
6.2 Find the index of a maximum value in a table
This routine finds the index of the maximum value of the data x(i), i = 1 to L, contained in a table whose first element is located at "Orig_Address". The operation is performed in two passes: the maximum value is detected first, and then the corresponding index is detected. At the end of the routine, the maximum value is held in the co-processor accumulator and the index in R1 (GPR).
;
; MAC dedicated registers' initialization:
;
        MOV     MRW, #L-1               ; (MRW) <- L-1.
        EXTR    #1                      ; next instruction will
                                        ; utilize the ESFR space.
;
; Accumulator Initialization.
;
        MOV     MAH, #FFFFH             ; (MAH) <- FFFFH,
                                        ; (MAE) <- FFH,
                                        ; (MAL) <- 0000H.
        MOV     MAL, #FFFFH             ; (MAL) <- FFFFH.
;
; GPRs initialization:
;
        MOV     R0, #0000H              ; (R0) <- 0000H
        MOV     R1, #Orig_Address       ; (R1) <- Orig_Address
;
; First Iteration: Detection of the maximum value
;
        REPEAT  MRW TIMES
        CoMAX   R0, [R1+]               ; (ACC) <- Max((ACC), x(i))
                                        ; (R1) <- (R1)+2
;
; Re-initialization:
;
        MOV     MRW, #L-1               ; (MRW) <- L-1.
        EXTR    #1                      ; next instruction will
                                        ; utilize the ESFR space.
        MOV     IDX0, #Dst_Address      ; (IDX0) <- Dst_Address.
        MOV     R1, #Orig_Address       ; (R1) <- Orig_Address
;
; Second Iteration: Detection of the corresponding index
;
        REPEAT  MRW TIMES
        CoCMP   cc_EQU R0, [R1+]        ; (MSW) <- (ACC)-x(i)
                                        ; (R1) <- (R1)+2
          Instruction Cycles    Program Words
Total     3L/2+10(1)            22
1. On average
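The two-pass search can be sketched in C as follows (hypothetical function max_index, returning a 0-based index, whereas the routine leaves the result in R1). The second pass stops at the first match, so it costs about L/2 comparisons on average, which is consistent with the 3L/2 average given above. The sketch seeds the running maximum with the first element, so tables whose entries are all negative are handled as well.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pass 1 finds the maximum value; pass 2 returns the index of its first
 * occurrence. The maximum is also written to *max_out when non-NULL. */
size_t max_index(const int16_t *x, size_t L, int16_t *max_out)
{
    assert(L >= 1 && x != NULL);
    int16_t m = x[0];                  /* seed with the first element */
    for (size_t i = 1; i < L; i++)     /* pass 1: running maximum (CoMAX) */
        if (x[i] > m)
            m = x[i];
    size_t i = 0;
    while (x[i] != m)                  /* pass 2: compare until equal (CoCMP) */
        i++;
    if (max_out != NULL)
        *max_out = m;
    return i;
}
```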
6.3 Compare for search
This routine finds the index of the first data item in a table that matches a specified condition "cc_cond" when compared with the contents of the accumulator. It assumes that the data are stored in numerical order in the table; otherwise, the same assumptions are made as in Section 6.1 and Section 6.2. When a match is found, the index is stored in R1.
;
; Initialization:
;
        MOV     MRW, #L-1               ; (MRW) <- L-1.
        MOV     R0, #0000H              ; (R0) <- 0000H
        MOV     R1, #Orig_Address       ; (R1) <- Orig_Address
;
; Accumulator Initialization.
;
        MOV     MAH, #data16            ; (MAH) <- #data16,
                                        ; (MAE) <- 8 times (MAH15),
                                        ; (MAL) <- 0000H.
;
; Search: detection of the index of the first match
;
        REPEAT  MRW TIMES
        CoCMP   cc_GT R0, [R1+]         ; (MSW) <- (ACC)-x(i)
                                        ; (R1) <- (R1)+2
        JNB     MSW.12, NO_MATCH        ; test C-flag of MSW and
                                        ; jump if no match
        ...
NO_MATCH:
        ...
          Instruction Cycles    Program Words
Total     L/2+9(1)              11
1. On average
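A C sketch of the search is shown below. The predicate type cond_fn and the example cond_gt are hypothetical stand-ins for cc_cond: reading cc_GT as "(ACC) - x(i) > 0" is an assumption of this sketch, not a statement about the MSW flag logic. Returning -1 plays the role of the NO_MATCH branch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for cc_cond, applied to the difference (ACC) - x(i). */
typedef int (*cond_fn)(int32_t acc_minus_x);

/* Example condition: "greater than", i.e. key > x(i). */
static int cond_gt(int32_t d) { return d > 0; }

/* Return the 0-based index of the first x[i] for which cond(key - x[i])
 * holds, or -1 when no element matches. */
long compare_for_search(const int16_t *x, size_t L, int16_t key, cond_fn cond)
{
    assert(cond != NULL && (L == 0 || x != NULL));
    for (size_t i = 0; i < L; i++)
        if (cond((int32_t)key - x[i]))
            return (long)i;
    return -1;
}
```

With the table 50, 40, 30, 20, 10 and a key of 25, the first element satisfying 25 - x(i) > 0 is 20, at index 3; a key of 5 finds no match.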
7 Summary of Routines
Routine                                           Instruction Cycles      Program Words
Co-Processor Initialization                       10                      19
Mathematics
  32 by 32 signed multiplication                  12                      24
  Nth Order Power Series                          4N+8                    28
  [NxN][Nx1] Matrix Multiply                      N^2+4N+7                24
  N-Real Multiply (Windowing)                     2N+2N/URF(1)+2          5+4URF
DSP Routines(2)
  16x16 L-tap FIR                                 L                       6
  32x16 L-tap FIR                                 2L+3                    18
  DF1(3) Nth Order IIR filter                     2N                      10
  DF2(4) Nth Order IIR filter                     2N+2                    14
  DF2 N-cascaded Biquads                          10N-1                   19
  TF(5) N-cascaded Biquads                        13N-1                   44
  16x16 L-tap LMS                                 4L+2(L-2)/URF+1         51+2(URF-1)
  32x16 L-tap LMS                                 6L+2(L-2)/URF+20        71+2(URF-1)
Operations on Tables
  Table Move (L items)                            L+4                     9
  Find the Index of a Maximum Value
  in a table (L items)                            3L/2+10(6)              22
  "Compare For Search"(7) (L items)               L/2+7(8)                11

1. "URF" stands for "UnRolling Factor".
2. Representative part of the routine only.
3. Direct Form 1.
4. Direct Form 2.
5. Transpose Form.
6. On average.
7. First data in a table that matches a specified condition.
8. On average.

Table 1 Summary of routines
Information furnished is believed to be accurate and reliable. However, STMicroelectronics assumes no responsibility for the consequences of use of such information nor for any infringement of patents or other rights of third parties which may result from its use. No license is granted by implication or otherwise under any patent or patent rights of STMicroelectronics. Specifications mentioned in this publication are subject to change without notice. This publication supersedes and replaces all information previously supplied. STMicroelectronics products are not authorized for use as critical components in life support devices or systems without express written approval of STMicroelectronics.
The ST logo is a trademark of STMicroelectronics © 1998 STMicroelectronics - All Rights Reserved STMicroelectronics GROUP OF COMPANIES Australia - Brazil - Canada - China - France - Germany - Italy - Japan - Korea - Malaysia - Malta - Mexico - Morocco The Netherlands - Singapore - Spain - Sweden - Switzerland - Taiwan - Thailand - United Kingdom - U.S.A.